
Nested arrays
==============

It would be useful to support the same set of array primitives on nested
vectors/arrays (nested arrays for short) as we do on n-D arrays. Clearly some
operations don't make sense if the nested array is ragged,
e.g. transpose-array. Some operations already work: (tally), (from) in some
cases, ply with verbs of rank -1. Ply with verbs of rank 0 seems well
defined but doesn't work now. Etc.


Sparse arrays
=============

It's simple enough to define some data types to store sparse arrays in whatever
common format (say CSR). As with nested arrays, the interest is in supporting
the same set of array primitives as on n-D arrays. In many cases the complexity
would not be the same, but it should be up to the user to determine the
cost/convenience tradeoff.


Temporaries
===========

As it is, ply and fold will generate many intermediate array temporaries. Some
of them are easy to remove, but others may require more work. Here's a
classification.


1. Function return

These are temporaries returned by cell operations. Removing them requires some
way to transform (op arg ...) -> result functions into (op! result arg ...)
functions. It may be possible to do this automatically by having some convention
on how these functions are declared. It would be harder for general functions
such as (make-verb op #f #f).

This also applies to the returns of fold and ply themselves. Those should
actually be the first to be fixed.


2. No oshape

If a verb doesn't have an oshape, it's necessary to call the cell op at least
once to compute it. Maybe the default op->verb conversion should be (make-verb
op '() #f) instead of (make-verb op #f #f), as an encouragement to provide
oshape. Another possibility is to compute the 1st cell, allocate, and continue
the loop. Then only a cell temporary is required instead of a whole array.


3. Missed loop fusion

This is part of a larger problem. A ply or fold application may not be the final
result of the computation, but an intermediate one, such as when reducing an
outer product.

Another example is when the cell op can be composed with the ply, for example
some selection operators:

  (ply (Jv (cut from <> 0) 1) (i. 2 2)) => (from (i. 2 2) #t 0)

This also shows in the following limitation of amend!. It's cheaper to do

  [1] (ply! (from A i ...) op B ...)

than it is to do

  [2] (amend! A (ply op B ...) i ...)

since [2] allocates a temporary for (ply op B ...) while [1] writes directly
into (from A i ...). However, [1] is bad practice because depending on the
values of i ..., (from A i ...) may not be a shared array over A, so the
statement would not amend A in those cases. On the other hand, amend! will
amend A for any valid i ...  . Loop fusion should remove the temporary in [2]
and allow it to be as efficient as [1] in every case.


Implementations of rank above the rank of the verb
==================================================

Sometimes it's convenient to implement a verb of rank-k as a function that takes
arrays of higher rank. This is normally done for efficiency reasons when the
function is implemented in a foreign module. For example, if a function
(rotate-vector v) is implemented in C, it makes sense for the function
to take a whole block of vectors at once, instead of relying on the rank
extension mechanism to call it for each 1-slice in this block.

The verb definition for verbs of rank k should support using lambas of rank
q≥k. When the argument's rank is in [k, q], it suffices to reshape in and out of
the call. When the argument's rank is >q it may not be possible to reshape
without copying, depending on the strides of the argument. Therefore the
argument traversal should use the lambda rank q instead of the verb rank k.
